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On  the  compression  of  low  rank  matrices 

H.  Cheng,  Z.  Gimbutas,  P,G.  Martinsson,  V.  Rokhlin 


Abstract:  A  procedure  is  reported  for  the  compression  of  rank-deficient 
matrices.  A  matrix  A  of  rank  k  is  represented  in  the  form  A  =  UoBoV, 
where  B  is  a  k  x  k  submatrix  of  A,  and  U,  V  are  well-conditioned 
matrices  that  each  contain  a  k  x  k  identity  submatrix.  This  property 
enables  such  compression  schemes  to  be  used  in  certain  situations  where 
the  SVD  cannot  be  used  efficiently.  Numerical  examples  are  presented. 

1.  Introduction 

In  computational  physics  (and  many  other  areas),  one  often  encounters  matrices  whose  ranks 
are  (to  high  precision)  much  lower  than  their  dimensionalities;  even  more  frequently,  one  is  con¬ 
fronted  with  matrices  posessing  large  submatrices  that  are  of  low  rank.  An  obvious  source  of  such 
matrices  is  the  potential  theory,  where  discretization  of  integral  equations  almost  always  results  in 
matrices  of  this  type.  Such  matrices  are  also  encountered  in  fluid  dynamics,  numerical  simulation 
of  electromagnetic  phenomena,  structural  mechanics,  multivariate  statistics  etc.  In  such  cases,  one 
is  tempted  to  “compress”  the  matrices  in  question,  so  that  they  could  be  efficiently  applied  to  arbi¬ 
trary  vectors;  compression  also  facilitates  the  storage  and  any  other  manipulation  of  such  matrices 
that  might  be  desirable. 

At  this  time,  several  classes  of  algorithms  exist  that  use  this  observation.  The  so-called  Fast 
Multipole  Methods  (FMMs)  are  algorithms  for  the  application  of  certain  classes  of  matrices  to 
arbitrary  vectors;  FMMs  tend  to  be  extremely  efficient,  but  are  only  applicable  to  very  narrow 
classes  of  operators  (see  [7]).  Another  approach  to  the  compression  of  operators  is  based  on  the 
wavelets  and  related  structures  (see,  for  example,  [3,  2]);  these  schemes  exploit  the  smoothness 
of  the  elements  of  the  matrix  viewed  as  a  function  of  their  indices,  and  tend  to  fail  for  highly 
oscillatory  operators. 

Finally,  there  is  a  class  of  compression  schemes  that  are  based  purely  on  linear  algebra,  and  are 
completely  insensitive  the  the  analytical  origin  of  the  operator.  It  consists  of  the  Singular  Value 
Decomposition  (SVD),  the  so-called  QR  and  QLP  factorizations  [8],  and  several  others.  Given  an 
m  x  n- matrix  A  of  rank  k  <  min(m,  n),  the  SVD  represents  A  in  the  form 

(1.1)  A  =  UoDoV , 

with  U  an  m  x  k,  matrix  whose  columns  are  orthonormal,  V  a  k  x  n  matrix  whose  rows  are 
orthonormal,  and  D  a  diagonal  matrix  whose  diagonal  elements  are  positive.  The  compression 
provided  by  the  SVD  is  perfect  in  terms  of  accuracy  (see,  for  example,  [5]),  and  has  a  simple 
geometric  interpretation:  it  expresses  each  of  the  columns  of  A  as  a  linear  combination  of  the  k 
(orthonormal)  columns  of  U;  it  also  represents  the  rows  of  A  as  linear  combinations  of  (orthonormal) 
rows  of  V ;  and  the  matrices  U,  V  are  chosen  in  such  a  manner  that  the  rows  of  U  are  images  (up 
to  a  scaling)  under  A  of  the  columns  of  V. 

In  this  paper,  we  propose  a  different  matrix  decomposition.  Specifically,  we  represent  the  matrix 
A  described  above  in  the  form 


A  =  UoBoV, 

l 


(1.2) 


where  B  is  a  k  x  /c-submatrix  of  A,  and  the  norms  of  the  matrices  U,  V  (of  dimensionalities  n  x  k, 
k  x  m  respectively)  are  reasonably  close  to  1  (see  Theorem  3  in  Section  3  below).  Furthermore, 
each  of  the  matrices  U ,  V  contains  a  unity  k  x  k  submatrix. 

Like  (1.1),  the  representation  (1.2)  has  a  simple  geometric  interpretation:  it  expresses  each  of 
the  columns  of  A  as  a  linear  combination  of  k  selected  columns  of  A,  and  each  of  the  rows  of  A  as 
a  linear  combination  of  k  selected  rows  of  A.  This  selection  defines  a  k  x  k  submatrix  B  of  A,  and 
in  the  resulting  system  of  coordinates,  the  action  of  A  is  represented  by  the  action  of  its  submatrix 
B. 

The  representation  (1.2)  has  the  advantage  that  the  bases  used  for  the  representation  of  the 
mapping  A  consists  of  the  columns  and  rows  of  A,  while  each  of  the  elements  of  the  bases  in  the 
representation  (1.1)  is  itself  a  linear  combination  of  all  rows  (or  columns)  of  the  matrix  A.  In 
Section  5,  we  illustrate  the  advantages  of  the  representation  (1.2)  by  constructing  an  accelerated 
direct  solver  for  integral  equations  of  potential  theory. 

Another  advantage  of  the  representation  (1.2)  is  that  the  numerical  procedure  for  constructing 
it  is  considerably  less  expensive  than  that  for  the  construction  of  the  SVD  (see  Section  4),  and  that 
the  cost  of  applying  (1.2)  to  an  arbitrary  vector  is 

(1.3)  (n  +  m  —  k)  •  k, 
vs. 

(1.4)  (n  +  m)-k 
for  the  SVD. 

The  obvious  disadvantage  of  (1.2)  vis-a-vis  (1.1)  is  the  fact  that  the  norms  of  the  the  matrices 
U,V  are  somewhat  greater  than  1,  leading  to  some  (though  minor)  loss  of  accuracy.  Another 
disadvantage  of  the  proposed  factorization  is  its  non-uniqueness;  in  this  respect  it  is  similar  to  the 
pivoted  QR  factorization. 

Remark  1.  In  (1.2),  the  submatrix  B  of  the  matrix  A  is  defined  as  the  intersection  of  k  columns 
with  k  rows.  Denoting  the  sequence  numbers  of  the  rows  by  *2>  --->**  and  the  sequence  numbers 
of  the  columns  by  ji,j2,  ■■■  ,jk,  we  will  be  referring  to  the  submatrix  B  of  A  as  the  skeleton  of  A, 
to  the  kxn  matrix  consisting  of  the  rows  of  A  numbered  ii,  *2»  •  •  •  >  *fc  as  the  row  skeleton  of  A,  and 
to  the  m  x  k  matrix  consisting  of  the  columns  of  A  numbered  ■  ■  ■  ,jk  as  the  column  skeleton 
of  A. 

The  structure  of  this  paper  is  as  follows.  Section  2  below  summarizes  several  facts  from  numerical 
linear  algebra  to  be  used  in  the  remainder  of  the  paper.  In  Section  3,  we  prove  the  existence  of 
a  stable  factorization  of  the  form  (1.2).  In  Section  4,  we  describe  a  reasonably  efficient  numerical 
algorithm  for  constructing  such  a  factorization.  In  Section  5,  we  illustrate  how  the  geometric 
properties  of  the  factorization  (1.2)  can  be  utilized  to  construct  an  accelerated  direct  solver  for 
integral  equations  of  potential  theory.  In  Section  6,  we  present  the  results  of  numerical  experiments 
with  the  direct  solver.  Finally,  Section  7  contains  a  discussion  of  other  possible  applications  of  the 
techniques  of  this  paper. 


2.  Preliminaries 

In  this  section  we  introduce  our  notation  and  summarize  several  facts  from  numerical  linear 
algebra;  these  can  all  be  found  in  [1], 
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Throughout  the  paper,  we  use  upper  case  letters  for  matrices  and  lower  case  letters  for  vectors 
and  scalars.  We  reserve  Q  for  matrices  that  have  orthonormal  columns  and  P  for  permutation 
matrices.  The  canonical  unit  vectors  in  Cn  are  denoted  by  e3.  Given  a  matrix  X,  we  let  X*  denote 
its  adjoint  (the  complex  conjugate  transpose),  Gk{X)  its  k- th  singular  value,  ||xj|2  its  l2- norm  and 
||X||f  its  Frobenius  norm.  Finally,  given  matrices  A,  B,  C  and  D  we  let 


(2.1) 


M  b]  . 


and 


'  A  B  ‘ 

C  D  ’ 


denote  larger  matrices  obtained  by  stringing  the  blocks  A,  B,  C  and  D  together. 

The  first  result  that  we  present  asserts  that  given  any  matrix  A,  it  is  possible  to  reorder  its 
columns  to  form  a  matrix  AP,  where  P  is  a  permutation  matrix,  with  the  following  property: 
When  AP  is  factorized  into  an  orthonormal  matrix  Q  and  an  upper  triangular  matrix  R,  so  that 
AP  =  QR,  then  the  singular  values  of  the  leading  k  x  k  submatrix  of  R  are  reasonably  good 
approximations  of  the  first  k  singular  values  of  A.  The  theorem  also  says  that  the  first  k  columns 
of  AP  form  a  well-conditioned  basis  for  the  column  space  of  A  to  within  accuracy  ak+i(A). 


Theorem  1.  [Gu  &  Eisenstat]  Suppose  that  A  is  an  m  x  n  matrix,  l  =  min(m,  n),  and  k  is  an 
integer  such  that  1  <  k  <  l.  Then  there  exists  a  factorization 

(2.2)  AP  =  QR, 


where  P  is  an  n  x  n  permutation  matrix,  Q  is  an  m  x  l  matrix  with  orthonormal  columns,  and  R 
is  an  l  x  n  upper  triangular  matrix.  Furthermore,  splitting  Q  and  R, 


(2.3) 


Q11 

Q12 

.  Q21 

Q22 

and 


Ru 

R\2 

0 

i?22 

in  such  a  fashion  that  Qn  and  i?u  are  of  size  kxk,  Q21  is  ( m  —  k)x  k,  Qu  is  k  x  (l  —  k),  Q22  is 
(m  —  k)  x  (l  —  k),  R\2  is  kx  (n  —  k)  and  R22  is  ( l  —  k)x(n  —  k ),  results  in  the  following  inequalities: 


(2.4) 


<Jk(Rn)  >  0k(A) 


1 

^/1  -I-  k(n  —  k)  ’ 


(2.5)  a  1(^22)  <  Vk+i(A)y/l  +  k(n  -  k), 

and 

(2-6)  HfljT,1**  <  VHn-k). 


Remark  2.  In  this  paper  we  do  not  use  the  full  power  of  Theorem  1  since  we  axe  only  concerned 
with  the  case  of  very  small  e  =  ak+i(A).  In  this  case,  the  inequality  (2.5)  implies  that  A  can  be 
well  approximated  by  a  low-rank  matrix.  In  particular,  (2.5)  implies  that 


(2.7) 


#12]  -P*||2  <  £\A  +  ~  &)• 


Furthermore,  the  inequality  (2.6)  in  this  case  implies  that  the  first  k  columns  of  AP  form  a  well- 
conditioned  basis  for  the  entire  column  space  of  A  (within  accuracy  e) . 


While  Theorem  1  asserts  the  existence  of  a  factorization  (2.2)  with  the  properties  (2.4),  (2.5), 
(2.6),  it  says  nothing  about  the  cost  of  constructing  such  a  factorization  numerically.  The  following 
theorem  asserts  that  a  factorization  that  satisfies  bounds  that  are  weaker  than  (2.4),  (2.5),  (2.6) 
by  a  factor  of  -Jn  can  be  computed  in  0(mn2)  operations. 
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Theorem  2.  [Gu  &  Eisenstat]  Given  an  m  x  n  matrix  A,  a  factorization  of  the  form  (2.2)  that 
instead  of  (2.4),  (2.5)  and  (2.6)  satisfies  the  inequalities 

(2’8)  atiRu)  5  Vl+nlln  -k)n{A)- 


(2.9) 
and 

(2.10) 


a\(B.22)  <  \/l  +  nk(n  -  k)ak+y(A), 


RnlRn\\F  <  \/nk(n  -  k), 


can  be  computed  in  0(mn2)  operations. 


3.  Analytical  apparatus 

In  this  section  we  prove  that  the  factorization  (1.2)  exists  by  applying  Theorem  1  to  both  the 
columns  and  the  rows  of  the  matrix  A.  Theorem  2  then  guarantees  that  the  factorization  can  be 
computed  efficiently. 

The  following  theorem  is  the  principal  analytic  tool  of  this  paper. 


Theorem  3.  Suppose  that  A  is  an  m  x  n  matrix  and  let  k  be  such  that  1  <  k  <  min(m,n).  Then 
there  exists  a  factorization 


(3.1) 


A  =  Ph 


As  [l\T]P(i  +  X, 


where  I  G  <Ckxk  is  the  identity  matrix,  Pi  and  Pr  are  permutation  matrices,  and  As  is  the  top  left 
k  x  k  submatrix  of  P( A  Pr.  In  (3.1),  the  matrices  S  G  C^m~k^xk  and  T  G  Cfcx(n-A0  satisfy  the 
inequalities 

(3.2)  ||S||F  <  yjk{m  -k),  and  ||T||f  <  \A(n  -  k ), 
and  the  matrix  X  is  small  if  the  ( k  +  1  )-th  singular  value  of  A  is  small, 

(3.3)  \\X\\2  <  ak+i{A)y/l  +  fc(min(m, n)  -  k). 


Proof:  The  proof  consists  of  two  steps.  First  Theorem  1  is  invoked  to  assert  the  existence  of  k 
columns  of  A  that  form  a  well-conditioned  basis  for  the  column  space  within  accuracy  ak+\{A)] 
these  are  collected  in  the  rn  x  k  matrix  Acs-  Then  Theorem  1  is  invoked  again  to  prove  that  k 
of  the  rows  of  Acs  form  a  well-conditioned  basis  for  its  row  space.  Without  loss  of  generality,  we 
assume  that  m>n  and  that  crk{A)  A  0. 

For  the  first  step  we  factor  A  into  matrices  Q  and  R  as  specified  by  Theorem  1,  letting  Pr  denote 
the  permutation  matrix.  Splitting  Q  and  R  into  submatrices  Qij  and  Rij  as  in  (2.3),  we  reorganize 
the  factorization  (2.2)  as  follows, 


(3.4)  APr  = 


Q  n 
Q21 


[Pn|  P12]  + 


Q 12 
Q22 


[o|P22] 


Q11P11 

Q21P11 


[/|  RJR12]  + 


■  0 

Q12R22 

0 

Q22R22  . 

We  now  define  the  matrix  T  G  Ckx^n  ^  via  the  formula 


(3.5) 


T  =  Pu]Pi2; 
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T  satisfies  the  inequality  (3.2)  by  virtue  of  (2.6).  We  define  the  matrix  X  G  Cmx"  via  the  formula 


(3.6) 


X  = 


'  0 

Q12R22 

0 

Q11R22 

PI 


which  satisfies  the  inequality  (3.3)  by  virtue  of  (2.5).  Defining  the  matrix  Acs  €  Cmxk  by 

Q11-R11 


(3.7) 


>lcs 


Q21-R11 

we  reduce  equation  (3.4)  to  the  form 

(3.8)  APr  =  Tics  (/|  T]  +  XPR. 

An  obvious  interpretation  of  (3.8)  is  that  Acs  consists  of  the  first  k  columns  of  the  matrix  AP& 
(since  the  corresponding  columns  of  XPR  are  identically  zero). 

The  second  step  of  the  proof  is  to  find  k  rows  of  Acs  forming  a  well-conditioned  basis  for  its 
row-space.  To  this  end,  we  factor  the  transpose  of  Acs  35  specified  by  Theorem  1, 


(3-9) 


A*csPl  =  Q 


Rn\Rl2 


Transposing  (3.9)  and  rearranging  the  terms  we  have 


(3.10) 


Pl-A-cs  = 


ni 


R*2  J 


Q* 


RnQ* 


[  R*n(R*i  1)-1 

Multiplying  (3.8)  by  P£  and  using  (3.10)  to  substitute  for  Pj*Acs  we  obtain 

I 


(3.11) 


P£APr  = 


-^12(^11) 


R*nQ*  [I\T]+P£XPr. 


We  now  convert  (3.11)  into  (3.1)  by  defining  the  matrices  As  G  Ckxk  and  S  G  C(n  k^xk  via  the 
formulae 

s  =  mMi)-1, 

a 


and 


(3.12)  As  =  R*nQ*, 
respectively. 

Remark  3.  While  the  definition  (3.5)  serves  its  purpose  within  the  proof  of  Theorem  3,  it  is 
somewhat  misleading.  Indeed,  it  is  more  reasonable  to  define  T  as  a  solution  of  the  equation 

(3.13)  || RnT  -  R12W2  <  ak+1(A)y/l  +  k(n-k). 

When  the  solution  is  non-unique  we  chose  a  solution  that  minimizes  ||T||f.  From  the  numerical 
point  of  view,  the  definition  (3.13)  is  much  preferable  to  (3.5)  since  it  is  almost  invariably  the  case 
that  Ru  is  highly  ill-conditioned,  if  not  outright  singular. 

Introducing  the  notation 


(3.14) 


^cs  =  Pl 


As  GC 


in  x  A; 


and  Ars  =  Ag  [l|  T]  Pr  G  C 


kxm 


we  observe  that  under  the  conditions  of  Theorem  3,  the  factorization  (3.1)  can  be  rewritten  in  the 
forms 


and 


(3.16) 


A  =  Pl 


Ars  +  X. 


The  matrix  Acs  consists  of  k  of  the  columns  of  A,  while  Ars  consists  of  k  of  the  rows.  We  refer  to 
As  as  the  skeleton  of  A,  and  to  Acs  and  Ars  as  the  column  and  row  skeletons,  respectively. 


Remark  4.  While  Theorem  3  guarantees  the  existence  of  a  well-conditioned  factorization  of  the 
form  (3.1),  it  says  nothing  about  the  cost  of  obtaining  such  a  factorization.  However,  it  follows 
immediately  from  Theorem  2  that  a  factorization  (3.1)  with  the  matrices  S ,  T,  and  X  satisfying 
the  weaker  bounds 


(3.17)  ||S||2  <  \/mk(m  -  k),  and  ||T||2  <  y/nk(n  -  k), 

and,  with  l  =  min(m,«), 

(3-18)  im|2  <  y/l  +  lk{l-k)ak+1(A)t 

can  be  constructed  at  the  cost  0(mnl). 


Observation  1.  The  relations  (3.1),  (3.15),  (3.16)  have  simple  geometric  interpretations.  Specif¬ 
ically,  (3.15)  asserts  that  for  a  matrix  A  of  rank  k,  it  is  possible  to  select  k  columns  that  form  a 
well-conditioned  basis  of  the  entire  column  space.  Let  ji, . . . ,  jk  €  {1, . . .  ,n}  denote  the  indices  of 
those  columns  and  let  Xk  —  span(ejn . . .  ,ejk)  C  Cn  (thus,  Xk  is  the  space  of  vectors  whose  only 
non-zero  coordinates  are  xjt, . . . ,  Xjk).  According  to  Theorem  3,  there  exists  an  operator 

(3-19)  Proj  :  Cn  ->  Xk, 


defined  by  the  formula 

(3.20) 

such  that  the  diagram 


Proj  =  PR 


I 

T 

3 

(3.21) 


is  commutative.  Here,  A'cs  is  the  mxn  matrix  formed  by  setting  all  columns  of  A  except  ji,.. .  ,jk 
to  zero.  Furthermore,  ax  (Proj )jak (Proj)  <  a/1  +  k(n  -  k).  Similarly,  equation  (3.16)  asserts  the 
existence  of  k  rows,  say  with  indices  i\, . . .  ,ik  6  {1, . . .  ,m},  that  form  a  well-conditioned  basis  for 
the  entire  row-space.  Setting  Yk  =  span(en , . . .  ,eik)  C  Cm,  there  exists  an  operator 

(3.22)  Eval  :  Yk  Cm, 

defined  by 


(3.23) 


Eval  =  Pi 


PI- 
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such  that  the  diagram 
(3.24) 


4 


e* 


cm 


is  commutative.  Here,  A'RS  is  the  m  x  n  matrix  formed  by  setting  all  rows  of  A  except  *1, . . .  ,i*  to 
zero.  Furthermore,  a\ (Eval)/cr/,,(Eval)  <  \/\  +  k{rn  —  k).  Finally,  the  geometric  interpretation  of 

(3.1)  is  the  combination  of  the  diagrams  (3.21)  and  (3.24), 

(3.25)  C”  — ^  Cw  • 


I’roj 


Eval 


Xk 


Here,  A's  is  the  m  x  «  matrix  formed  by  setting  all  entries  of  A,  except  those  at  the  intersection  of 
the  rows  ii,. .. ,ik  with  the  columns  j\ , . . . ,  j/. ,  to  zero. 

As  a  comparison,  we  consider  the  diagram 

(3.26)  Cn  — ^  Cm 

“uh 

Ck  Ck 

obtained  when  the  SVD  is  used  to  compress  the  matrix  A  G  Cmxn.  Here,  is  the  k  x  k  diagonal 
matrix  formed  by  the  k  largest  singular  values  of  A,  and  14  and  Uk  are  column  matrices  containing 
the  corresponding  right  and  left  singular  vectors,  respectively.  The  factorization  (3.25)  has  the 
advantage  over  (3.26)  that  the  mappings  Proj  and  Eval  leave  k  of  the  coordinates  invariant.  This 
is  gained  at  the  price  of  non-orthonormality  of  these  mappings. 


4.  Numerical  apparatus 

In  this  section,  we  present  a  simple  and  reasonably  efficient  procedure  for  computing  the  factor¬ 
ization  (3.1).  It,  has  been  extensively  tested  and  consistently  produces  factorizations  that  satisfy 
the  bounds  (3.17).  While  there  exist  matrices  for  which  this  simple  approach  will  not  work  well, 
they  appear  to  be  exceedingly  rare. 

Given  an  in  x  n  matrix  /l,  the  first  step  (out  of  four)  is  to  apply  the  pivoted  Gram-Schmidt 
process  to  its  columns.  The  process  is  halted  when  the  column  space  has  been  exhausted  to  a 
preset  accuracy  e,  leaving  a  factorization 

(4.1)  A]\  =  Q[Ru\Rl2], 

where  Pr  €  CTlXn  is  a  permutation  matrix,  Q  G  Crnxk  has  orthonormal  columns,  7?u  G  Ckxk  is 
upper  triangular',  and  Ru  €  C/,:X^"_A:). 

The  second  step  is  to  find  a  matrix  T  G  Ckx{n~k)  that  solves  the  equation 

(4.2)  RnT  =  R12 
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to  within  accuracy  e.  When  i?n  is  ill-conditioned,  there  is  a  large  set  of  solutions;  we  pick  one  for 
which  ||T||f  is  small. 

Letting  Acs  €  Cmxk  denote  the  matrix  formed  by  the  first  k  columns  of  APr,,  we  now  have  a 
factorization 

(4-3)  A  =  Acs  [/|  T]  PI 

The  third  and  the  fourth  steps  are  entirely  analogous  to  the  first  and  the  second,  but  are  concerned 
with  finding  k  rows  of  Acs  that  form  a  basis  for  its  row-space.  They  result  in  a  factorization 

(4.4)  Acs  =  Pl  -gr~  As- 

The  desired  factorization  is  now  obtained  by  inserting  (4.4)  into  (4.3): 

-4s  [/|T]P£. 

For  this  technique  to  be  successful,  it  is  crucially  important  that  the  Gram-Schmidt  factorization 
be  performed  accurately.  Modified  Gram-Schmidt  or  the  method  using  Householder  reflectors  are 
not  accurate  enough.  Instead,  we  use  a  technique  that  is  based  on  modified  Gram-Schmidt,  but 
that  at  each  step  re-orthogonalizes  the  vector  chosen  to  add  to  the  basis  before  adding  it.  In  exact 
arithmetic,  this  step  would  be  superfluous,  but  in  the  presence  of  round-off  error  it  greatly  increases 
the  quality  of  the  factorization  generated,  see  e.g.  [6]. 

5.  Application:  An  accelerated  direct  solver  for  integral  equations 

In  this  section  we  use  the  matrix  compression  technique  presented  in  Section  3  to  construct 
an  accelerated  direct  solver  for  boundary  integral  equations  with  non-oscillatory  kernels.  Upon 
discretization,  such  equations  lead  to  dense  systems  of  linear  equations,  and  iterative  methods 
combined  with  fast  matrix-vector  multiplication  techniques  are  commonly  used  to  obtain  the  so¬ 
lution.  Many  such  fast  multiplication  techniques  take  advantage  of  the  fact  that  the  off-diagonal 
blocks  of  the  discrete  system  typically  have  low  rank.  Employing  the  matrix  compression  techniques 
presented  in  Section  3,  we  use  this  low-rank  property  to  accelerate  direct,  rather  than  iterative, 
solution  techniques.  The  method  uses  no  machinery  beyond  what  is  described  in  Section  3  and  is 
applicable  to  most  integral  equations  involving  non-oscillatory  kernels. 

For  concreteness,  we  consider  the  equation 

(5.1)  u(x)  +  J^K(x,y)u(y)dy  =  f(x),  for  x  €  V, 

where  T  is  some  contour  and  K(x,  y)  is  a  non-oscillatory  kernel.  The  function  u  represents  an 
unknown  “charge”  distribution  on  F  that  is  to  be  determined  from  the  given  function  /.  The 
method  that  we  present  works  for  almost  any  contour  but  for  simplicity,  we  will  assume  that  the 

contour  consists  of  p  disjoint  pieces,  r  =  FH - hT?J,  where  all  pieces  have  similar  size  (an  example 

is  given  in  Fig.  3).  In  fact,  to  simplify  the  formulas,  we  will  for  the  most  part  set  p  =  3. 
Discretizing  each  contour  T,  using  n  points,  the  equation  (5.1)  takes  the  form 
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FIGURE  1.  Zeros  are  introduced  into  the  matrix  in  three  steps:  (a)  interaction 
between  Tj  and  the  other  contours  is  compressed,  (b)  interaction  with  T2  is  com¬ 
pressed,  (c)  interaction  with  T3  is  compressed.  The  small  black  blocks  are  of  size 
k  x  k  and  consist  of  entries  that  have  not  been  changed  beyond  permutations,  grey 
blocks  refer  to  updated  parts  and  white  blocks  are  all  zero  entries. 


where  uW  g  C"  and  f^>  €  €”  are  discrete  representations  of  the  unknown  boundary  charge 
distribution  and  the  right  hand  side  associated  with  F;,  and  €  Cnxn  is  a  dense  matrix 

representing  the  evaluation  of  a  potential  on  caused  by  a  charge  distribution  on  Tj. 

The  interaction  between  F \  and  the  rest  of  the  contour  is  governed  by  the  matrices 


(5.3)  HW  =  \m{1’2)\  M{1’3)]  €  Cnx2n,  and 


For  non-oscillatory  kernels,  these  matrices  are  typically  highly  rank-deficient.  We  let  k  denote 
upper  bound  on  their  ranks  (to  within  some  preset  level  of  accuracy  e).  By  virtue  of  (3.16), 
know  that  there  exist  k  rows  of  which  form  a  well-conditioned  basis  for  all  the  n  rows, 
other  words,  there  exists  a  well-conditioned  n  x  n  matrix  (see  Remark  6)  such  that 

(5.4)  jrWjyO)  =  [Jlgfij  +  0(e), 

z 


where  is  a  k  x  2n  matrix  formed  by  k  of  the  rows  of  and  Z  is  the  (n—  A:)  x  2 n  zero  matrix. 
There  similarly  exist  annxn  matrix  i?W  such  that 


(5.5)  F(1)R(1)  =  [v§|  Z*J  +  0(e), 

where  V^g  is  a  2n  x  k  matrix  formed  by  k  of  the  columns  of  V^.  For  simplicity,  we  will  henceforth 
assume  that  the  off-diagonal  blocks  have  exact  rank  at  most  k  and  ignore  the  error  terms. 

The  relations  (5.4)  and  (5.5)  imply  that  by  restructuring  equation  (5.2)  as  follows, 


we  introduce  large  blocks  of  zeros  in  the  matrix,  as  shown  in  Figure  1(a). 

Next,  we  compress  the  interaction  between  T2  and  the  rest  of  the  contour  to  obtain  the  matrix 
structure  shown  in  Fig.  1(b).  Repeating  the  process  with  T3,  we  obtain  the  final  structure  shown 
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%  § 


(a) 


Figure  2.  In  order  to  determine  the  R®  and  that  compress  the  interaction 
between  Tj  (shown  in  bold)  and  the  remaining  contours,  it  is  sufficient  to  consider 
only  the  interactions  between  the  contours  drawn  with  a  solid  line  in  (b). 


in  Fig.  1(c).  At  this  point,  we  have  constructed  matrices  R®  and  L®  and  formed  the  new  system 


•  lWmWrW 

i(l)jlf(l'3)R(3)  ' 

r  (rW)-v1)  i 

r  /«  1 

(5.7) 

L^MWRW 

lWMM)rW 

L(2)M(2, 3)^(3) 

= 

-LW /(2) 

IlW'mWrW 

~lWm\Wr&) 

(R(3))-V3)  J 

[' £(*)/( 3)  J 

whose  matrix  is  shown  in  Figure  1(c).  We  emphasize  that  the  k  x  k  non-zero  parts  of  the  off- 
diagonal  blocks  are  submatrices  of  the  original  n  x  n  off-diagonal  blocks.  The  parts  of  the  matrix 
that  are  shown  as  grey  in  the  figure  represent  interactions  that  are  internal  to  each  contour.  These 
n-k  degrees  of  freedom  per  contour  can  be  eliminated  by  performing  a  local,  0(n3),  operation  for 
each  contour.  This  leaves  a  dense  system  of  3  x  3  blocks,  each  of  size  kx  k.  Thus,  we  have  reduced 
the  problem  size  by  a  factor  of  n/k. 

Remark  5.  For  the  algorithm  presented  above,  the  compression  of  the  interaction  between  a  fixed 
contour  and  its  p  —  1  fellows  is  quite  costly  since  it  requires  the  construction  and  compression  of  the 
large  matrices  G  (^nx(p-i)n  ancj  y(*)  g  C(p_1)nXn.  In  the  numerical  examples  presented  below, 
this  step  is  avoided  by  constructing  matrices  and  7?W  that  satisfy  (5.4)  and  (5.5)  through  an 
entirely  local  procedure.  We  illustrate  how  this  is  done  by  considering  the  contours  in  Fig.  2(a) 
and  supposing  that  we  want  to  find  the  transforms  that  compress  the  interaction  of  the  contour  r* 
(drawn  with  a  bold  line)  with  the  remaining  ones.  This  can  be  done  by  compressing  the  interaction 
between  F,  and  an  artificial  contour  rartif  that  surrounds  F,  (as  shown  in  Fig.  2(b))  combined  with 
the  parts  of  the  other  contours  that  penetrate  it.  This  procedure  works  for  any  potential  problem 
for  which  the  Green’s  identities  hold.  The  computational  cost  for  one  compression  is  0(kn ?)  rather 
than  the  0{pkn2)  cost  for  constructing  and  compressing  the  entire  IfW  and  VW. 

To  sum  up:  The  accelerated  solver  consists  of  four  steps.  For  a  problem  involving  p  contours, 
each  of  which  is  discretized  using  n  nodes  and  having  off-diagonal  blocks  of  rank  at  most  k,  they 
are: 

(1)  The  off-diagonal  blocks  are  skeletonized  and  the  diagonal  nxn  blocks  are  updated  at  a 
cost  of  0{pkn 2)  using  the  technique  described  in  Remark  5. 
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(2)  The  n  —  k  degrees  of  freedom  that  represent  internal  interactions  for  each  contour  are 
eliminated  at  a  cost  of  0(pn3). 

(3)  The  reduced  kp  x  kp  system  is  solved  at  a  cost  of  0(k3p3). 

(4)  The  solution  of  the  original  system  is  reconstructed  from  the  solution  of  the  reduced  problem 
through  p  local  operations  at  a  cost  of  0(pn2). 

The  third  step  is  typically  the  most  expensive  one  with  an  asymptotic  cost  of  t(comp)  ~  ck3p3.  The 
cost  of  a  solution  of  the  uncompressed  equations  is  £(uncomP)  ~  cn3p3.  Consequently; 

t(uncomp)  3 

sP“d-“P  =  15S5T  ~  (*)  ■ 


Remark  6.  The  existence  of  the  matrices  and  are  direct  consequences  of  (3.16)  and 
(3.15),  respectively.  Specifically,  substituting  H ^  for  A  in  (3.16),  we  obtain 


(5.8) 


M(1)  = 


S 


rrC1) 

•"RS* 


where  is  the  k  x  2n  matrix  consisting  of  the  top  k  rows  of  P£H(l\  The  relation  (5.4)  now 
follows  from  (5.8)  by  defining 


(5.9) 


L^  = 


I 

0  ' 

~s 

/ 

Pf. 


We  note  that  the  largest  and  smallest  singular  values  of  L satisfy 


^(£(1|)<(i  +  l|S||?.)I/2, 


Thus  cond(L^^)  <  1  +  ||5||p,  which  is  of  moderate  size  according  to  Theorem  3.  The  matrix 
is  similarly  constructed  by  forming  the  column  skeleton  of  . 


Remark  7.  Equations  (5.4)  and  (5.5)  have  simple  heuristic  interpretations:  Equation  (5.4)  says 
that  it  is  possible  to  choose  k  points  on  the  contour  Ti  in  such  a  way  that  when  a  field  generated  by 
charge  distributions  on  the  rest  of  the  contour  is  known  at  those  points,  it  is  possible  to  extrapolate 
the  field  at  the  remaining  points  on  Ti  from  those  values.  Equation  (5.5)  says  that  it  is  possible  to 
choose  k  points  on  Tx  in  such  a  way  that  any  field  on  the  rest  of  the  contour  generated  by  charges 
on  Ti ,  can  be  replicated  by  placing  charges  only  on  those  k  points. 


Remark  8.  It  is  sometimes  advantageous  to  choose  the  same  k  points  when  constructing  the 
skeletons  of  H®  and  This  can  be  achieved  by  compressing  the  two  matrices  jointly,  for 

instance  by  forming  the  row  skeleton  of  [ifW  |  (yM)*j.  In  this  case  L W  =  (i?W)*.  When  this  is 
done,  the  compression  ratio  deteriorates  since  the  singular  values  of  [ifW|  (y(*))*]  decay  slower 
than  those  of  either  JfW  or  V^l\  as  is  seen  by  comparing  Figures  4  and  5. 

Remark  9.  When  the  solution  of  equation  (5.2)  is  sought  for  multiple  right-hand  sides,  the  cost 
of  the  first  solve  is  0(mnk).  Subsequent  solves  can  be  preformed  using  0(p2k2  +pn 2)  operations 
rather  than  0(p2n 2)  for  an  uncompressed  solver. 


li 


Figure  3.  The  contours  used  for  the  numerical  calculations  with  p  =  128.  Picture 
(a)  shows  the  full  contour  and  a  box  (which  is  not  part  of  the  contour)  that  indicates 
the  location  of  the  close-up  shown  in  (b). 


Remark  10.  The  direct  solver  that  we  have  presented  has  a  computational  complexity  that  scales 
cubically  with  the  problem  size  N  and  is  thus  not  a  “fast”  algorithm.  However,  by  applying  the 
techniques  presented  recursively,  it  is  possible  to  reduce  the  asymptotic  complexity  to  0(iV3/2), 
and  possibly  even  O(NlogN).  This  is  a  topic  of  current  research. 

6.  Numerical  results 

The  algorithm  described  in  Section  5  has  been  computationally  tested  on  the  second  kind  integral 
equation  obtained  by  discretizing  an  exterior  Dirichlet  boundary  value  problem  using  the  double 
layer  kernel.  The  contours  used  consisted  of  a  number  of  jagged  circles  arranged  in  a  skewed  square 
as  shown  in  Fig.  3.  The  number  of  contours  p  ranged  from  8  to  128.  For  this  problem,  n  =  200 
points  per  contour  were  required  to  obtain  a  relative  accuracy  of  e  =  10~6.  We  found  that  to  this 
level  of  accuracy,  no  or  pW  had  rank  exceeding  k  =  50.  As  an  example,  we  show  in  Fig.  4 
the  singular  values  of  the  matrices  and  pW  representing  interactions  between  the  highlighted 
contour  in  Fig.  2(a)  and  the  remaining  ones. 

The  algorithm  described  in  Section  5  was  implemented  in  FORTRAN  and  run  on  a  2.8GHz 
Pentium  IV  desktop  PC  with  512Mb  RAM.  The  CPU  times  for  a  range  of  different  problem  sizes 
are  presented  in  Table  1.  The  data  presented  supports  the  following  claims  for  the  compressed 
solver: 

•  For  large  problems,  the  CPU  time  speed-up  approaches  the  estimated  factor  of  ( n/k )3  =  64. 

•  The  reduced  memory  requirement  make  large  problems  amenable  to  direct  solution. 

Remark  11.  In  the  interest  of  simplicity,  we  forced  the  program  to  use  the  same  compression  ratio 
k/n  for  each  contour.  In  general,  it  detects  the  required  interaction  rank  of  each  contour  as  its 
interaction  matrices  are  being  compressed  and  uses  different  ranks  for  each  contour. 
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Figure  4.  Plots  of  the  singular  values  of  (a)  and  (b)  ifW  for  a  discretization 
of  the  double  layer  kernel  associated  with  the  Laplace  operator  on  the  nine  contours 
depicted  in  Fig.  2(a).  In  the  example  shown,  the  contours  were  discretized  using 
n  =  200  points,  giving  a  relative  discretization  error  of  about  10~6.  The  plots  show 
that  to  that  level  of  accuracy,  the  matrices  V W  6  C;i600x200  and  H M  g  <q200x1600 
have  numerical  rank  less  than  k  =  50  (to  accuracy  10“6). 


p 

^(uncomp) 

^(comp) 

.(comp) 

linit 

.(comp) 

‘'solve 

Error 

8 

5.6 

2.0  (4.6) 

1.6  (4.1) 

0.05 

8.1  •  10~r(1.4  •  10~Y) 

16 

50 

4.1  (16.4) 

3.1  (15.5) 

0.4 

2.9  •  10"6(2.8  •  10~7) 

32 

451 

13.0  (72.1) 

6.4  (65.3) 

5.5 

4.4  •  10-6(4.4  •  10-7) 

64 

3700 

65  (270) 

14  (220) 

48 

— 

128 

30000 

480  (1400) 

31  (960) 

440 

— 

Table  1.  CPU  times  in  seconds  for  solving  (5.2).  p  is  the  number  of  contours. 
t(uncomp)  is  the  CPU  time  required  to  solve  the  uncompressed  equations;  the  numbers 
in  italics  are  estimated  since  these  problems  did  not  fit  in  RAM.  t(com p)  is  the 
CPU  time  to  solve  the  equations  using  the  compression  method;  this  time  is  split 
between  the  time  to  compress  the  equations,  and  t^™p\  the  time  to  solve 

the  reduced  system  of  equations.  The  error  is  the  relative  error  incurred  by  the 
compression  measured  in  the  maximum  norm  when  the  right  hand  side  is  a  vector 
of  ones.  Throughout  the  table,  the  numbers  in  parenthesis  refer  to  numbers  obtained 
when  the  technique  of  Remark  5  is  not  used. 


7.  Conclusions 

We  have  described  a  “compression”  scheme  for  low-rank  matrices.  For  a  matrix  A  of  dimen¬ 
sionality  m  x  n  and  rank  k,  the  factorization  can  be  applied  to  an  arbitrary  vector  for  the  cost  of 
( n  +  m-k)-k  operations,  after  a  significant  initial  factorization  cost;  this  is  marginally  faster  than 
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Figure  5.  Plot  of  the  singular  values  of  X®  =  [H^  |  (V«)*]  where  JfW  and 
are  as  in  Figure  4.  The  numerical  rank  of  is  approximately  80,  which  is  larger 
than  the  individual  ranks  of  and  VM. 


the  cost  (n  +  m)  •  k  produced  by  the  SVD.  The  factorization  cost  is  roughly  the  same  as  that  for 
the  rank-revealing  QR  decomposition  of  A. 

A  more  important  advantage  of  the  proposed  decomposition  is  the  fact  that  it  expresses  all  of 
the  columns  of  A  as  linear  combinations  of  k  appropriately  selected  columns  of  A,  and  all  of  the 
rows  of  A  as  linear  combinations  of  k  appropriately  selected  rows  of  A.  Since  each  of  the  basis 
vectors  (both  row  and  column)  produced  by  the  SVD  (or  any  other  classical  factorizations)  is  a 
linear  combination  of  all  rows  (columns)  of  A,  the  decomposition  we  propose  is  considerably  easier 
to  manipulate;  we  illustrate  this  point  by  constructing  an  accelerated  scheme  for  the  direct  solution 
of  integral  equations  of  potential  theory  in  the  plane. 

A  related  advantage  of  the  proposed  decomposition  is  the  fact  that  one  frequently  encounters 
collections  of  matrices  such  that  the  same  selection  of  rows  and  columns  can  be  used  for  each  matrix 
to  span  its  row  and  column  space  (in  other  words,  there  exist  fixed  Pi  and  Pr  such  that  each  matrix 
in  the  collection  has  a  decomposition  (3.1)  with  small  matrices  S  and  T).  Once  one  matrix  in  such 
a  collection  has  been  factorized,  the  decomposition  of  the  remaining  ones  is  considerably  simplified 
since  the  skeleton  of  the  first  can  be  reused.  If  it  should  happen  that  the  skeleton  of  the  first  matrix 
that  was  decomposed  is  not  a  good  choice  for  some  other  matrix,  this  is  easily  detected  (since  then 
no  small  matrices  S  and  T  can  be  computed)  and  the  global  skeleton  can  be  extended  as  necessary. 

We  have  constructed  several  other  numerical  procedures  using  the  approach  described  in  this 
paper.  In  particular,  a  code  has  been  designed  for  the  (reasonably)  rapid  solution  of  scattering  prob¬ 
lems  in  the  plane  based  on  the  direct  (as  opposed  to  iterative)  solution  of  the  Lippman-Schwinger 
equation;  the  scheme  utilizes  the  same  idea  as  that  used  in  [4],  and  has  the  same  asymptotic  CPU 
time  estimate  0(N 3/2)  for  a  square  region  discretized  into  N  nodes.  However,  the  CPU  timps 
obtained  by  us  are  a  significant  improvement  on  these  reported  in  [4];  the  paper  reporting  this 
work  is  in  preparation. 

It  also  appears  to  be  possible  to  utilize  the  techniques  of  this  paper  to  construct  an  order 
0(N  log  N)  (or  possibly  even  order  order  O(N)  (!))  scheme  for  the  solution  of  elliptic  PDEs  in 
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both  two  and  three  dimensions,  provided  that  the  associated  Green’s  function  is  not  oscillatory. 
This  work  is  in  progress,  and  if  successful  will  be  reported  at  a  later  date. 

References 

[1]  Ming  Gu  and  Stanley  C.  Eiseustat,  Efficient,  algorithms  for  computing  a  strong  rank-revealing  QR  factorization, 
SIAM  J.  Sci.  Comput.  17  (1996),  no.  4,  848-869. 

[2]  B.  Alpert,  G.  Beylkin,  R.  Coifman,  V.  Rokhlin,  Wavelet-like  bases  for  the  fast  solution  of  second-kind  integral 
equations,  SIAM  J.  Sci.  Comput.,  vol.  14,  pp.  159-184,  1993. 

[3]  G.  Beylkin,  R.  Coifman,  and  V.  Roklilin,  Fast  wavelet  transforms  and  numerical  algorithms  I,  Communications 
on  Pure  and  Applied  Mathematics,  14:141-183  (1991). 

[4]  Yu  Chen,  Fast  direct  solver  for  the  Lippmann-Schwinger  equation,  Advances  in  Computational  Mathematics, 
vol.  16,  pp.  175-190,  2002. 

[5]  G.H.  Golub,  C.F.  Van  Loan,  Matrix  Computations,  Johns  Hopkins  University  Press,  1989. 

[6]  A  Bjorck,  Numerics  of  Gram-Schmidt  orthogonalization,  Linear  Algebra  Appl.,  vol.  197/198,  pp.  297-316,  1994. 

[7]  G. Beylkin,  On  multiresolution  methods  in  numerical  analysis,  Documenta  Mathematica,  Extra  Volume  ICM 
1998,  III,  pp.  481-490,  1998. 

[8]  G.W.  Stewart,  Matrix  Algorithms,  Vol.  I,  SIAM,  Philadelphia  1998. 


15 


